Overview

Dataset statistics

Number of variables: 17
Number of observations: 328272
Missing cells: 814010
Missing cells (%): 14.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 42.6 MiB
Average record size in memory: 136.0 B
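The percentage of missing cells and the average record size follow from the other figures in this block. The sketch below recomputes them from the reported numbers; it assumes that 1 MiB means 1024 × 1024 bytes and that "cells" means variables × observations. The variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 17
n_observations = 328272
missing_cells = 814010
total_size_mib = 42.6

# Assumption: a "cell" is one variable of one observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The record size computed this way can differ from the reported 136.0 B in the last digit, because the total size is itself rounded to one decimal place.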

Variable types

Numeric: 8
Categorical: 8
Unsupported: 1

Alerts

Open has constant value "1.0" (Constant)
Date has a high cardinality: 577 distinct values (High cardinality)
Promo is highly correlated with Open (High correlation)
StoreType is highly correlated with Open and 1 other field (High correlation)
Promo2 is highly correlated with Open and 1 other field (High correlation)
SchoolHoliday is highly correlated with Open (High correlation)
Open is highly correlated with Promo and 5 other fields (High correlation)
PromoInterval is highly correlated with Promo2 and 1 other field (High correlation)
Assortment is highly correlated with StoreType and 1 other field (High correlation)
StoreType is highly correlated with Assortment (High correlation)
Assortment is highly correlated with StoreType (High correlation)
CompetitionOpenSinceYear is highly correlated with Promo2SinceWeek (High correlation)
Promo2SinceWeek is highly correlated with CompetitionOpenSinceYear and 2 other fields (High correlation)
Promo2SinceYear is highly correlated with Promo2SinceWeek and 1 other field (High correlation)
PromoInterval is highly correlated with Promo2SinceWeek and 1 other field (High correlation)
DayOfWeek has 9890 (3.0%) missing values (Missing)
Open has 9966 (3.0%) missing values (Missing)
Promo has 9934 (3.0%) missing values (Missing)
StateHoliday has 10057 (3.1%) missing values (Missing)
SchoolHoliday has 9961 (3.0%) missing values (Missing)
StoreType has 10050 (3.1%) missing values (Missing)
Assortment has 10050 (3.1%) missing values (Missing)
CompetitionDistance has 10852 (3.3%) missing values (Missing)
CompetitionOpenSinceMonth has 111142 (33.9%) missing values (Missing)
CompetitionOpenSinceYear has 111142 (33.9%) missing values (Missing)
Promo2 has 10050 (3.1%) missing values (Missing)
Promo2SinceWeek has 166972 (50.9%) missing values (Missing)
Promo2SinceYear has 166972 (50.9%) missing values (Missing)
PromoInterval has 166972 (50.9%) missing values (Missing)
df_index is uniformly distributed (Uniform)
df_index has unique values (Unique)
StateHoliday is an unsupported type; check if it needs cleaning or further analysis (Unsupported)
Store has 10050 (3.1%) zeros (Zeros)

Reproduction

Analysis started: 2021-10-28 21:43:32.428299
Analysis finished: 2021-10-28 21:44:06.756532
Duration: 34.33 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 328272
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 256354.5956
Minimum: 1
Maximum: 512924
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 25552.55
Q1: 127906.25
median: 256458.5
Q3: 384805.5
95-th percentile: 487255.45
Maximum: 512924
Range: 512923
Interquartile range (IQR): 256899.25

Descriptive statistics

Standard deviation: 148141.3797
Coefficient of variation (CV): 0.5778768249
Kurtosis: -1.201978714
Mean: 256354.5956
Median Absolute Deviation (MAD): 128451
Skewness: 0.0004043840439
Sum: 8.41540358 × 10^10
Variance: 2.194586839 × 10^10
Monotonicity: Not monotonic
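Several of the derived statistics above can be checked against the primary ones. The sketch below uses the df_index figures and the usual textbook definitions (CV = standard deviation / mean, IQR = Q3 − Q1, range = maximum − minimum, variance = standard deviation squared). Whether the report computes them in exactly this way is an assumption.

```python
# Figures copied from the df_index statistics.
std = 148141.3797
mean_value = 256354.5956
q1, q3 = 127906.25, 384805.5
minimum, maximum = 1, 512924

cv = std / mean_value            # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance as the squared standard deviation

print(f"CV = {cv:.10f}")
print(f"IQR = {iqr}")
print(f"Range = {value_range}")
print(f"Variance = {variance:.6e}")
```

The same relations can be used to sanity-check the other numeric variables in this report.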
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
143140 | 1 | < 0.1%
76793 | 1 | < 0.1%
381142 | 1 | < 0.1%
43335 | 1 | < 0.1%
142359 | 1 | < 0.1%
105298 | 1 | < 0.1%
395802 | 1 | < 0.1%
60301 | 1 | < 0.1%
427066 | 1 | < 0.1%
82413 | 1 | < 0.1%
Other values (328262) | 328262 | > 99.9%

Smallest values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%
11 | 1 | < 0.1%
12 | 1 | < 0.1%
13 | 1 | < 0.1%

Largest values

Value | Count | Frequency (%)
512924 | 1 | < 0.1%
512922 | 1 | < 0.1%
512921 | 1 | < 0.1%
512920 | 1 | < 0.1%
512919 | 1 | < 0.1%
512918 | 1 | < 0.1%
512917 | 1 | < 0.1%
512915 | 1 | < 0.1%
512914 | 1 | < 0.1%
512913 | 1 | < 0.1%

Date
Categorical

HIGH CARDINALITY

Distinct: 577
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 MiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2013-06-12
2nd row: 2014-07-08
3rd row: 2014-04-08
4th row: 2014-02-17
5th row: 2013-10-05

Common Values

Value | Count | Frequency (%)
2013-03-26 | 745 | 0.2%
2014-06-02 | 730 | 0.2%
2013-03-02 | 728 | 0.2%
2014-06-06 | 727 | 0.2%
2014-05-21 | 727 | 0.2%
2013-12-07 | 726 | 0.2%
2013-01-21 | 726 | 0.2%
2014-04-28 | 726 | 0.2%
2013-03-18 | 724 | 0.2%
2013-05-04 | 723 | 0.2%
Other values (567) | 320990 | 97.8%

Length

Histogram of lengths of the category


Store
Real number (ℝ≥0)

ZEROS

Distinct: 1116
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 541.0298929
Minimum: 0
Maximum: 1115
Zeros: 10050
Zeros (%): 3.1%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 23
Q1: 253
median: 541
Q3: 828
95-th percentile: 1058
Maximum: 1115
Range: 1115
Interquartile range (IQR): 575

Descriptive statistics

Standard deviation: 331.0234172
Coefficient of variation (CV): 0.6118394224
Kurtosis: -1.209007497
Mean: 541.0298929
Median Absolute Deviation (MAD): 287
Skewness: 0.007933478021
Sum: 177604965
Variance: 109576.5027
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 10050 | 3.1%
769 | 356 | 0.1%
1097 | 356 | 0.1%
85 | 355 | 0.1%
274 | 354 | 0.1%
335 | 351 | 0.1%
578 | 347 | 0.1%
524 | 346 | 0.1%
733 | 344 | 0.1%
512 | 341 | 0.1%
Other values (1106) | 315072 | 96.0%

Smallest values

Value | Count | Frequency (%)
0 | 10050 | 3.1%
1 | 288 | 0.1%
2 | 298 | 0.1%
3 | 269 | 0.1%
4 | 294 | 0.1%
5 | 302 | 0.1%
6 | 286 | 0.1%
7 | 289 | 0.1%
8 | 295 | 0.1%
9 | 285 | 0.1%

Largest values

Value | Count | Frequency (%)
1115 | 288 | 0.1%
1114 | 297 | 0.1%
1113 | 288 | 0.1%
1112 | 299 | 0.1%
1111 | 307 | 0.1%
1110 | 290 | 0.1%
1109 | 268 | 0.1%
1108 | 293 | 0.1%
1107 | 271 | 0.1%
1106 | 274 | 0.1%

DayOfWeek
Real number (ℝ≥0)

MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 9890
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.523484368
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 3
Q3: 5
95-th percentile: 6
Maximum: 7
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.724473201
Coefficient of variation (CV): 0.4894226909
Kurtosis: -1.262991539
Mean: 3.523484368
Median Absolute Deviation (MAD): 2
Skewness: 0.01511939346
Sum: 1121814
Variance: 2.97380782
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Common values

Value | Count | Frequency (%)
6 | 54460 | 16.6%
2 | 54313 | 16.5%
3 | 53127 | 16.2%
5 | 52673 | 16.0%
1 | 51826 | 15.8%
4 | 50675 | 15.4%
7 | 1308 | 0.4%
(Missing) | 9890 | 3.0%

Smallest values

Value | Count | Frequency (%)
1 | 51826 | 15.8%
2 | 54313 | 16.5%
3 | 53127 | 16.2%
4 | 50675 | 15.4%
5 | 52673 | 16.0%
6 | 54460 | 16.6%
7 | 1308 | 0.4%

Largest values

Value | Count | Frequency (%)
7 | 1308 | 0.4%
6 | 54460 | 16.6%
5 | 52673 | 16.0%
4 | 50675 | 15.4%
3 | 53127 | 16.2%
2 | 54313 | 16.5%
1 | 51826 | 15.8%

Open
Categorical

CONSTANT
HIGH CORRELATION
MISSING
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 9966
Missing (%): 3.0%
Memory size: 2.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 318306 | 97.0%
(Missing) | 9966 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1.0 | 318306 | 100.0%


Promo
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 9934
Missing (%): 3.0%
Memory size: 2.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
0.0 | 180271 | 54.9%
1.0 | 138067 | 42.1%
(Missing) | 9934 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0.0 | 180271 | 56.6%
1.0 | 138067 | 43.4%
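The Common Values table and the pie chart table report different percentages for the same counts. The sketch below reproduces both sets from the Promo counts. It assumes that the first table divides by all rows, including missing ones, and that the pie chart divides by non-missing rows only. This reading matches the reported figures but is inferred from the numbers rather than stated in the report.

```python
# Counts copied from the Promo tables.
counts = {"0.0": 180271, "1.0": 138067}
missing = 9934

present_rows = sum(counts.values())
total_rows = present_rows + missing

# Share of all rows (missing values included in the denominator).
share_of_all = {v: round(100 * c / total_rows, 1) for v, c in counts.items()}
# Share of rows with a value (missing values excluded).
share_of_present = {v: round(100 * c / present_rows, 1) for v, c in counts.items()}

print(total_rows)
print(share_of_all)
print(share_of_present)
```

The same distinction explains why Open shows 97.0% in one table and 100.0% in the other.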


StateHoliday
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 10057
Missing (%): 3.1%
Memory size: 2.5 MiB

SchoolHoliday
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 9961
Missing (%): 3.0%
Memory size: 2.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 258914 | 78.9%
1.0 | 59397 | 18.1%
(Missing) | 9961 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0.0 | 258914 | 81.3%
1.0 | 59397 | 18.7%


StoreType
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 4
Distinct (%): < 0.1%
Missing: 10050
Missing (%): 3.1%
Memory size: 2.5 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: b
2nd row: c
3rd row: a
4th row: a
5th row: a

Common Values

Value | Count | Frequency (%)
a | 171726 | 52.3%
d | 98701 | 30.1%
c | 42142 | 12.8%
b | 5653 | 1.7%
(Missing) | 10050 | 3.1%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
a | 171726 | 54.0%
d | 98701 | 31.0%
c | 42142 | 13.2%
b | 5653 | 1.8%


Assortment
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 10050
Missing (%): 3.1%
Memory size: 2.5 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: b
2nd row: a
3rd row: a
4th row: c
5th row: a

Common Values

Value | Count | Frequency (%)
a | 168388 | 51.3%
c | 146757 | 44.7%
b | 3077 | 0.9%
(Missing) | 10050 | 3.1%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
a | 168388 | 52.9%
c | 146757 | 46.1%
b | 3077 | 1.0%


CompetitionDistance
Real number (ℝ≥0)

MISSING

Distinct: 654
Distinct (%): 0.2%
Missing: 10852
Missing (%): 3.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 5437.019658
Minimum: 20
Maximum: 75860
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 MiB

Quantile statistics

Minimum: 20
5-th percentile: 130
Q1: 710
median: 2320
Q3: 6880
95-th percentile: 20390
Maximum: 75860
Range: 75840
Interquartile range (IQR): 6170

Descriptive statistics

Standard deviation: 7774.515789
Coefficient of variation (CV): 1.429922325
Kurtosis: 13.53265719
Mean: 5437.019658
Median Absolute Deviation (MAD): 1970
Skewness: 2.984334232
Sum: 1725818780
Variance: 60443095.75
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
250 | 3486 | 1.1%
1200 | 2462 | 0.7%
350 | 2324 | 0.7%
50 | 2305 | 0.7%
190 | 2281 | 0.7%
90 | 2072 | 0.6%
150 | 1979 | 0.6%
330 | 1946 | 0.6%
180 | 1930 | 0.6%
2640 | 1751 | 0.5%
Other values (644) | 294884 | 89.8%
(Missing) | 10852 | 3.3%

Smallest values

Value | Count | Frequency (%)
20 | 277 | 0.1%
30 | 1162 | 0.4%
40 | 1458 | 0.4%
50 | 2305 | 0.7%
60 | 852 | 0.3%
70 | 1384 | 0.4%
80 | 894 | 0.3%
90 | 2072 | 0.6%
100 | 1442 | 0.4%
110 | 1745 | 0.5%

Largest values

Value | Count | Frequency (%)
75860 | 334 | 0.1%
58260 | 309 | 0.1%
48330 | 301 | 0.1%
46590 | 315 | 0.1%
45740 | 262 | 0.1%
44320 | 299 | 0.1%
40860 | 346 | 0.1%
40540 | 298 | 0.1%
38710 | 285 | 0.1%
38630 | 332 | 0.1%

CompetitionOpenSinceMonth
Real number (ℝ≥0)

MISSING

Distinct: 12
Distinct (%): < 0.1%
Missing: 111142
Missing (%): 33.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.229830056
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
median: 8
Q3: 10
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.206326984
Coefficient of variation (CV): 0.4434858025
Kurtosis: -1.240109682
Mean: 7.229830056
Median Absolute Deviation (MAD): 3
Skewness: -0.1745438302
Sum: 1569813
Variance: 10.28053273
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=12)

Common values

Value | Count | Frequency (%)
9 | 36006 | 11.0%
4 | 27037 | 8.2%
11 | 26230 | 8.0%
3 | 19820 | 6.0%
7 | 18866 | 5.7%
12 | 18114 | 5.5%
10 | 17556 | 5.3%
6 | 14311 | 4.4%
5 | 12436 | 3.8%
2 | 11564 | 3.5%
Other values (2) | 15190 | 4.6%
(Missing) | 111142 | 33.9%

Smallest values

Value | Count | Frequency (%)
1 | 4009 | 1.2%
2 | 11564 | 3.5%
3 | 19820 | 6.0%
4 | 27037 | 8.2%
5 | 12436 | 3.8%
6 | 14311 | 4.4%
7 | 18866 | 5.7%
8 | 11181 | 3.4%
9 | 36006 | 11.0%
10 | 17556 | 5.3%

Largest values

Value | Count | Frequency (%)
12 | 18114 | 5.5%
11 | 26230 | 8.0%
10 | 17556 | 5.3%
9 | 36006 | 11.0%
8 | 11181 | 3.4%
7 | 18866 | 5.7%
6 | 14311 | 4.4%
5 | 12436 | 3.8%
4 | 27037 | 8.2%
3 | 19820 | 6.0%

CompetitionOpenSinceYear
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 23
Distinct (%): < 0.1%
Missing: 111142
Missing (%): 33.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 2008.673822
Minimum: 1900
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 MiB

Quantile statistics

Minimum: 1900
5-th percentile: 2001
Q1: 2006
median: 2010
Q3: 2012
95-th percentile: 2014
Maximum: 2015
Range: 115
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 6.089048579
Coefficient of variation (CV): 0.003031377475
Kurtosis: 125.5519057
Mean: 2008.673822
Median Absolute Deviation (MAD): 3
Skewness: -7.805820394
Sum: 436143347
Variance: 37.0765126
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=23)

Common values

Value | Count | Frequency (%)
2013 | 23784 | 7.2%
2012 | 23423 | 7.1%
2014 | 19828 | 6.0%
2005 | 17736 | 5.4%
2010 | 15897 | 4.8%
2011 | 15618 | 4.8%
2009 | 15326 | 4.7%
2008 | 15230 | 4.6%
2007 | 13675 | 4.2%
2006 | 13442 | 4.1%
Other values (13) | 43171 | 13.2%
(Missing) | 111142 | 33.9%

Smallest values

Value | Count | Frequency (%)
1900 | 262 | 0.1%
1961 | 286 | 0.1%
1990 | 1429 | 0.4%
1994 | 569 | 0.2%
1995 | 571 | 0.2%
1998 | 283 | 0.1%
1999 | 2336 | 0.7%
2000 | 2861 | 0.9%
2001 | 4618 | 1.4%
2002 | 7751 | 2.4%

Largest values

Value | Count | Frequency (%)
2015 | 10645 | 3.2%
2014 | 19828 | 6.0%
2013 | 23784 | 7.2%
2012 | 23423 | 7.1%
2011 | 15618 | 4.8%
2010 | 15897 | 4.8%
2009 | 15326 | 4.7%
2008 | 15230 | 4.6%
2007 | 13675 | 4.2%
2006 | 13442 | 4.1%

Promo2
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 10050
Missing (%): 3.1%
Memory size: 2.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 0.0
3rd row: 1.0
4th row: 0.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 161300 | 49.1%
0.0 | 156922 | 47.8%
(Missing) | 10050 | 3.1%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1.0 | 161300 | 50.7%
0.0 | 156922 | 49.3%


Promo2SinceWeek
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 24
Distinct (%): < 0.1%
Missing: 166972
Missing (%): 50.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.48708617
Minimum: 1
Maximum: 50
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 13
median: 22
Q3: 37
95-th percentile: 45
Maximum: 50
Range: 49
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 14.12971432
Coefficient of variation (CV): 0.6015950303
Kurtosis: -1.382649973
Mean: 23.48708617
Median Absolute Deviation (MAD): 13
Skewness: 0.08531614575
Sum: 3788467
Variance: 199.6488267
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=24)

Common values

Value | Count | Frequency (%)
14 | 23077 | 7.0%
40 | 21176 | 6.5%
31 | 12527 | 3.8%
10 | 11981 | 3.6%
5 | 11211 | 3.4%
1 | 10002 | 3.0%
37 | 9982 | 3.0%
13 | 9556 | 2.9%
45 | 9487 | 2.9%
22 | 9243 | 2.8%
Other values (14) | 33058 | 10.1%
(Missing) | 166972 | 50.9%

Smallest values

Value | Count | Frequency (%)
1 | 10002 | 3.0%
5 | 11211 | 3.4%
6 | 279 | 0.1%
9 | 3920 | 1.2%
10 | 11981 | 3.6%
13 | 9556 | 2.9%
14 | 23077 | 7.0%
18 | 8225 | 2.5%
22 | 9243 | 2.8%
23 | 1398 | 0.4%

Largest values

Value | Count | Frequency (%)
50 | 278 | 0.1%
49 | 264 | 0.1%
48 | 2660 | 0.8%
45 | 9487 | 2.9%
44 | 844 | 0.3%
40 | 21176 | 6.5%
39 | 1629 | 0.5%
37 | 9982 | 3.0%
36 | 2767 | 0.8%
35 | 7120 | 2.2%

Promo2SinceYear
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 166972
Missing (%): 50.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.761711
Minimum: 2009
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 MiB

Quantile statistics

Minimum: 2009
5-th percentile: 2009
Q1: 2011
median: 2012
Q3: 2013
95-th percentile: 2014
Maximum: 2015
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.667723406
Coefficient of variation (CV): 0.0008289865527
Kurtosis: -1.05326506
Mean: 2011.761711
Median Absolute Deviation (MAD): 1
Skewness: -0.1217980382
Sum: 324497164
Variance: 2.781301358
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Common values

Value | Count | Frequency (%)
2011 | 36226 | 11.0%
2013 | 34322 | 10.5%
2014 | 26326 | 8.0%
2012 | 23104 | 7.0%
2009 | 20530 | 6.3%
2010 | 17994 | 5.5%
2015 | 2798 | 0.9%
(Missing) | 166972 | 50.9%

Smallest values

Value | Count | Frequency (%)
2009 | 20530 | 6.3%
2010 | 17994 | 5.5%
2011 | 36226 | 11.0%
2012 | 23104 | 7.0%
2013 | 34322 | 10.5%
2014 | 26326 | 8.0%
2015 | 2798 | 0.9%

Largest values

Value | Count | Frequency (%)
2015 | 2798 | 0.9%
2014 | 26326 | 8.0%
2013 | 34322 | 10.5%
2012 | 23104 | 7.0%
2011 | 36226 | 11.0%
2010 | 17994 | 5.5%
2009 | 20530 | 6.3%

PromoInterval
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 166972
Missing (%): 50.9%
Memory size: 2.5 MiB

Length

Max length: 16
Median length: 15
Mean length: 15.18701798
Min length: 15

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Feb,May,Aug,Nov
2nd row: Mar,Jun,Sept,Dec
3rd row: Feb,May,Aug,Nov
4th row: Jan,Apr,Jul,Oct
5th row: Jan,Apr,Jul,Oct

Common Values

Value | Count | Frequency (%)
Jan,Apr,Jul,Oct | 94280 | 28.7%
Feb,May,Aug,Nov | 36854 | 11.2%
Mar,Jun,Sept,Dec | 30166 | 9.2%
(Missing) | 166972 | 50.9%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
jan,apr,jul,oct | 94280 | 58.5%
feb,may,aug,nov | 36854 | 22.8%
mar,jun,sept,dec | 30166 | 18.7%


Interactions

[Figures: 64 pairwise interaction plots; images not preserved]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
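The calculation described above can be sketched in plain Python as follows. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the sample data are illustrative, and giving tied values the average of their rank positions is an assumption; the report does not say how it handles ties.

```python
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations.

    The 1/n factors of covariance and standard deviations cancel,
    so plain sums are used.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho as Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y grows nonlinearly but monotonically with x.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)
print(rho)
```

Because y is a strictly increasing function of x, the ranks agree exactly and ρ is 1 up to floating-point rounding, even though the relationship is not linear.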

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
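A minimal sketch of this calculation, together with a check of the invariance property, is given below. The function name `pearson_r` and the sample data are illustrative and not taken from the report; the scale factors in the check are positive, since a negative factor would flip the sign of r.

```python
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/n factors cancel between numerator and denominator,
    so plain sums are used.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical, roughly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_original = pearson_r(x, y)
# Shift and rescale each variable separately.
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

print(r_original, r_transformed)
```

Both calls give the same value up to floating-point rounding, which illustrates the invariance under changes of location and scale.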

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
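The pair-counting definition above corresponds to the variant usually called tau-a, sketched below. The function name and data are illustrative. Tie-corrected variants such as tau-b also exist, and the report's own implementation may use one of them, so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1  # both variables move in the same direction
        elif product < 0:
            discordant += 1  # the variables move in opposite directions
        # product == 0 means a tie in x or y; it counts for neither group
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 − 1) / 6.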

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
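A sketch of a bias-corrected Cramér's V is given below. It follows the commonly cited form of Bergsma's correction (shrinking φ² and the table dimensions by terms in 1/(n − 1)). The function name, the toy data and the handling of degenerate tables are assumptions, and the report's own implementation may differ in detail.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for r_val, r_count in row_totals.items():
        for c_val, c_count in col_totals.items():
            expected = r_count * c_count / n
            observed = joint.get((r_val, c_val), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Corrections: subtract the expected bias and shrink the dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    # Assumption: a table with a single row or column has no association.
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Hypothetical labels: the second variable is fully determined by the first.
v_perfect = cramers_v_corrected(["a"] * 50 + ["d"] * 50, ["a"] * 50 + ["c"] * 50)
# Hypothetical labels: every combination occurs equally often.
v_independent = cramers_v_corrected(["x", "x", "y", "y"] * 25, ["p", "q", "p", "q"] * 25)
print(v_perfect, v_independent)
```

The first pair gives a value of 1 up to rounding, and the second gives 0, matching the two ends of the range described above.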

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Missing values

A simple visualization of nullity by column.

The nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
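Nullity correlation, as described for the heatmap above, can be sketched as the Pearson correlation between presence indicators (1 if a value is present, 0 if it is missing). The function name, the use of `None` for missing values and the toy columns are illustrative assumptions; the plotting code behind the report may compute it differently.

```python
from statistics import mean


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = sum((p - ma) ** 2 for p in a) ** 0.5
    sb = sum((q - mb) ** 2 for q in b) ** 0.5
    if sa == 0 or sb == 0:
        # A column that is always present or always missing has no variation.
        return float("nan")
    return cov / (sa * sb)


# Hypothetical columns: week and year are missing on the same rows.
promo2_since_week = [14, None, 36, None, None, 18]
promo2_since_year = [2013, None, 2013, None, None, 2011]
competition_distance = [900, 2460, 220, None, 8260, 1850]

r_week_year = nullity_correlation(promo2_since_week, promo2_since_year)
r_week_distance = nullity_correlation(promo2_since_week, competition_distance)
print(r_week_year, r_week_distance)
```

Columns that are always missing together, as Promo2SinceWeek, Promo2SinceYear and PromoInterval appear to be from their identical missing counts, would give a nullity correlation of 1.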

Sample

First rows

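Row | df_index | Date | Store | DayOfWeek | Open | Promo | StateHoliday | SchoolHoliday | StoreType | Assortment | CompetitionDistance | CompetitionOpenSinceMonth | CompetitionOpenSinceYear | Promo2 | Promo2SinceWeek | Promo2SinceYear | PromoInterval
0 | 143140 | 2013-06-12 | 353 | 3.0 | 1.0 | 0.0 | 0 | 0.0 | b | b | 900.0 | NaN | NaN | 1.0 | 14.0 | 2013.0 | Feb,May,Aug,Nov
1 | 494300 | 2014-07-08 | 267 | 2.0 | 1.0 | 0.0 | 0 | 0.0 | c | a | 2460.0 | 1.0 | 2012.0 | 0.0 | NaN | NaN | NaN
2 | 417249 | 2014-04-08 | 734 | 2.0 | 1.0 | 0.0 | 0.0 | 0.0 | a | a | 220.0 | NaN | NaN | 1.0 | 36.0 | 2013.0 | Mar,Jun,Sept,Dec
3 | 370025 | 2014-02-17 | 0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
4 | 249373 | 2013-10-05 | 510 | NaN | 1.0 | NaN | 0 | 0.0 | a | c | 8260.0 | NaN | NaN | 0.0 | NaN | NaN | NaN
5 | 60513 | 2013-03-08 | 506 | 5.0 | 1.0 | 1.0 | 0 | 0.0 | a | a | 1850.0 | 12.0 | 2014.0 | 1.0 | 18.0 | 2011.0 | Feb,May,Aug,Nov
6 | 191972 | 2013-08-03 | 333 | 6.0 | 1.0 | 0.0 | NaN | 0.0 | a | c | 3720.0 | 2.0 | 2010.0 | 0.0 | NaN | NaN | NaN
7 | 170357 | 2013-07-11 | 487 | 4.0 | 1.0 | 0.0 | 0 | 1.0 | d | c | 2180.0 | 9.0 | 2012.0 | 1.0 | 40.0 | 2012.0 | Jan,Apr,Jul,Oct
8 | 462423 | 2014-05-31 | 154 | 6.0 | 1.0 | 0.0 | 0 | 0.0 | d | c | 16420.0 | NaN | NaN | 0.0 | NaN | NaN | NaN
9 | 388081 | 2014-03-08 | 401 | 6.0 | 1.0 | 0.0 | 0.0 | 0.0 | a | c | 9200.0 | 10.0 | 2009.0 | 1.0 | 14.0 | 2012.0 | Jan,Apr,Jul,Oct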

Last rows

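Row | df_index | Date | Store | DayOfWeek | Open | Promo | StateHoliday | SchoolHoliday | StoreType | Assortment | CompetitionDistance | CompetitionOpenSinceMonth | CompetitionOpenSinceYear | Promo2 | Promo2SinceWeek | Promo2SinceYear | PromoInterval
328262 | 349779 | 2014-01-25 | 893 | 6.0 | 1.0 | 0.0 | 0 | 0.0 | a | a | 130.0 | NaN | NaN | 1.0 | 1.0 | 2013.0 | Jan,Apr,Jul,Oct
328263 | 473873 | 2014-06-13 | 732 | 5.0 | 1.0 | 0.0 | 0 | 0.0 | a | c | 35280.0 | NaN | NaN | 0.0 | NaN | NaN | NaN
328264 | 271611 | 2013-10-29 | 841 | 2.0 | 1.0 | 0.0 | 0 | 1.0 | a | a | 27650.0 | 8.0 | 2004.0 | 0.0 | NaN | NaN | NaN
328265 | 422267 | 2014-04-14 | 870 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | a | a | 780.0 | 4.0 | 2009.0 | 0.0 | NaN | NaN | NaN
328266 | 327648 | 2013-12-31 | 833 | 2.0 | 1.0 | 0.0 | 0 | 1.0 | d | c | 3290.0 | 12.0 | 1999.0 | 1.0 | 35.0 | 2010.0 | Mar,Jun,Sept,Dec
328267 | 158668 | 2013-06-28 | 45 | 5.0 | 1.0 | 0.0 | 0 | 0.0 | d | a | 9710.0 | 2.0 | 2014.0 | 0.0 | NaN | NaN | NaN
328268 | 254962 | 2013-10-11 | 0 | 5.0 | 1.0 | 1.0 | 0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
328269 | 421119 | 2014-04-12 | 945 | 6.0 | 1.0 | 0.0 | 0.0 | 0.0 | a | c | 12480.0 | 3.0 | 2011.0 | 0.0 | NaN | NaN | NaN
328270 | 39908 | 2013-02-14 | 388 | 4.0 | 1.0 | 0.0 | 0 | 0.0 | a | a | 2260.0 | NaN | NaN | 0.0 | NaN | NaN | NaN
328271 | 293468 | 2013-11-22 | 196 | 5.0 | 1.0 | 1.0 | 0 | 0.0 | c | a | 3850.0 | 11.0 | 2005.0 | 1.0 | 14.0 | 2011.0 | Jan,Apr,Jul,Oct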